Understanding human activity is very challenging even with recently developed 3D/depth sensors. To address this problem, this work investigates a novel deep structured model that adaptively decomposes an activity instance into temporal parts using convolutional neural networks (CNNs). Our model advances traditional deep learning approaches in two respects. First, we incorporate latent temporal structure into the deep model, accounting for the large temporal variations of diverse human activities. In particular, we use latent variables to decompose the input activity into a number of temporally segmented sub-activities, and accordingly feed them into the parts (i.e., sub-networks) of the deep architecture. Second, we incorporate a radius-margin bound as a regularization term into our deep model, which effectively improves generalization performance for classification. For model training, we propose a principled learning algorithm that iteratively (i) discovers the optimal latent variables (i.e., the ways of decomposing an activity) for all training instances, (ii) updates the classifiers based on the generated features, and (iii) updates the parameters of the multi-layer neural networks. In the experiments, our approach is validated on several complex scenarios for human activity recognition and demonstrates superior performance over other state-of-the-art approaches.
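To make step (i) of the training procedure concrete, the following is a minimal sketch of latent temporal decomposition on toy data. It is not the model described above: the CNN sub-networks are replaced by hypothetical linear scorers (`part_weights`), each segment is summarized by the mean of its frame features, and the cut points are found by exhaustive search. All function names are illustrative assumptions.

```python
from itertools import combinations


def segment_means(frames, cuts):
    """Average the frame features inside each segment defined by the cut points."""
    bounds = [0, *cuts, len(frames)]
    means = []
    for start, end in zip(bounds, bounds[1:]):
        segment = frames[start:end]
        dim = len(segment[0])
        means.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return means


def score(frames, cuts, part_weights):
    """Sum of per-part linear scores; part m only sees segment m.

    Assumption: a linear scorer stands in for each CNN sub-network.
    """
    means = segment_means(frames, cuts)
    return sum(
        sum(w * x for w, x in zip(weights, mean))
        for weights, mean in zip(part_weights, means)
    )


def best_decomposition(frames, part_weights):
    """Search all cut points (the latent variable) and return the best-scoring ones."""
    num_parts = len(part_weights)
    candidates = combinations(range(1, len(frames)), num_parts - 1)
    return max(candidates, key=lambda cuts: score(frames, cuts, part_weights))


# Toy sequence: two frames of one sub-activity followed by three of another.
frames = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
part_weights = [[1, 0], [0, 1]]  # hypothetical scorers, one per temporal part
cuts = best_decomposition(frames, part_weights)
```

With the scorers held fixed, the search places the cut where the sub-activity changes. In a full training loop, this inference step would alternate with updates to the classifiers and to the network parameters, as in steps (ii) and (iii).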